Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ssh tunnel support for jdbc connectors, issue #312 #5438

Closed

Conversation

airbyte-jenny
Copy link
Contributor

@airbyte-jenny airbyte-jenny commented Aug 16, 2021

What

Add ssh tunnel support to Java JDBC connectors.

How

Uses apache mina sshd client to connect through ssh, set up a tunnel, and port forward the database ports to where the connector can use them. Tunnel is initiated at a high level of abstraction and closed when it's no longer needed by that particular workflow.

The spec.json configuration required is injected during json deserialization, adding the ssh tunnel config to both the source and the destination for jdbc databases. That means that this feature becomes available to all jdbc sources and destinations without providing connector-specific config screen changes. The user of course does have to fill in specific port forward values per actual connection they create.

Recommended reading order

  1. SSHTunnel.java
  2. AbstractJdbcSource.java
  3. AbstractJdbcDestination.java
  4. Any integration tests for Postgres

Relates to #312

@github-actions github-actions bot added the area/connectors Connector related issues label Aug 16, 2021
@airbyte-jenny airbyte-jenny changed the title Airbyte jenny/connector ssh tunnels issue 312 Add ssh tunnel support for jdbc connectors, issue #312 Aug 19, 2021
Copy link
Contributor

@jrhizor jrhizor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is tunneling for discovering the catalog for jdbc connectors also going to be supported as part of this PR?

I'd like to take another look once the closing pieces are finished.

@@ -5,6 +5,12 @@ plugins {

dependencies {
implementation 'commons-cli:commons-cli:1.4'
implementation group: 'org.apache.sshd', name: 'sshd-mina', version: '2.7.0'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
implementation group: 'org.apache.sshd', name: 'sshd-mina', version: '2.7.0'
implementation 'org.apache.sshd:sshd-mina:2.7.0'

nit: we generally use this format

}

/**
* Extension point for child classes to add things to the spec json tree before it's deserialized
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Extension point for child classes to add things to the spec json tree before it's deserialized
* Extension point for child classes to modify the spec json tree before it's deserialized

* @param root
* @return root
*/
public JsonNode addToSpec(JsonNode root) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public JsonNode addToSpec(JsonNode root) throws IOException {
public JsonNode transformSpec(JsonNode root) throws IOException {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it makes sense to not expose this method in BaseConnector and instead just doing it on AbstractJdbcDestination and AbstractJdbcSource? I know it means that we will have to override spec() in those respective classes, so it probably a tiny bit more code change. On the plus side though, I think this transformation is going to be pretty specific to our JDBC sources and destinations, so exposing this transform just in those abstractions as opposed to all of our other connectors keeps the base a bit simpler.

@@ -42,7 +44,18 @@
public ConnectorSpecification spec() throws Exception {
// return a JsonSchema representation of the spec for the integration.
final String resourceString = MoreResources.readResource("spec.json");
return Jsons.deserialize(resourceString, ConnectorSpecification.class);
return Jsons.object(addToSpec(Jsons.deserialize(resourceString)), ConnectorSpecification.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return Jsons.object(addToSpec(Jsons.deserialize(resourceString)), ConnectorSpecification.class);
return Jsons.object(transformSpec(Jsons.deserialize(resourceString)), ConnectorSpecification.class);

String remoteDatabasePort,
String tunnelDatabasePort) {
if (method == null) {
this.method = "NO_TUNNEL";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should us a constant for NO_TUNNEL since it's used in a couple places

this.tunnelDatabasePort = tunnelDatabasePort;
}

public static SSHTunnel getInstance(JsonNode config) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading through the code is a bit confusing for the case where the SSHTunnel doesn't have a valid configuration.

Maybe it'd be easier to read if this was Optional<SSHTunnel> getInstance(...)? Then the calls to it could be done with tunnel.ifPresent(...) which might read a bit better to show that this is an optional feature.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we instead explicitly validate which case we're in (no tunnel/ssh key/pw auth) and do null checks/validation based on that?

/**
* Closes a tunnel if one was open, and otherwise doesn't do anything (safe to run).
*/
public void closeTunnel() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could actually be close() and the class could be AutoClosable so we could do this with a try-with-resources block instead of a try-finally block.

Copy link
Contributor

@sherifnada sherifnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code changes LGTM - should tests go in a separate PR or this one?

this.tunnelDatabasePort = tunnelDatabasePort;
}

public static SSHTunnel getInstance(JsonNode config) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we instead explicitly validate which case we're in (no tunnel/ssh key/pw auth) and do null checks/validation based on that?

* Encapsulates the connection configuration for an ssh tunnel port forward through a proxy/bastion
* host plus the remote host and remote port to forward to a specified local port.
*/
public class SSHTunnel {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it make sense to extend autocloseable so we can use a try-with-resources?

"tunnel_method": {
"description": "No ssh tunnel needed to connect to database",
"type": "string",
"enum": ["NO_TUNNEL"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use const: "NO_TUNNEL" instead of enum/default like described here? enum/default shows two dropdowns to the user which is confusing.

Separately it might be better for the UI to name all these options without undesrcores e.g No Tunnel, SSH Key Auth, SSH Password Auth etc.. since it's more user friendly

Copy link
Contributor

@cgardens cgardens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only doing comment because i think we are still waiting on integration tests.

@@ -11,6 +11,7 @@ application {
dependencies {
implementation 'com.google.cloud:google-cloud-storage:1.113.16'
implementation 'com.google.auth:google-auth-library-oauth2-http:0.25.5'
implementation group: 'org.apache.sshd', name: 'sshd-mina', version: '2.7.0'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
implementation group: 'org.apache.sshd', name: 'sshd-mina', version: '2.7.0'
implementation 'org.apache.sshd:sshd-mina:2.7.0'

@@ -0,0 +1,167 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this file the same as the one in airbyte-integrations/connectors/destinations-relational-db/src/main/resources/ssh-tunnel-spec.json? Is there any way we can DRY it out?

* @param root
* @return root
*/
public JsonNode addToSpec(JsonNode root) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it makes sense to not expose this method in BaseConnector and instead just doing it on AbstractJdbcDestination and AbstractJdbcSource? I know it means that we will have to override spec() in those respective classes, so it probably a tiny bit more code change. On the plus side though, I think this transformation is going to be pretty specific to our JDBC sources and destinations, so exposing this transform just in those abstractions as opposed to all of our other connectors keeps the base a bit simpler.

Comment on lines +221 to +223
if (getHost() == null) {
throw new RuntimeException("SSH Tunnel host is null - verify configuration before starting tunnel!");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (getHost() == null) {
throw new RuntimeException("SSH Tunnel host is null - verify configuration before starting tunnel!");
}
Preconditions.notNull(getHost(), "SSH Tunnel host is null - verify configuration before starting tunnel!")

we tend to use Preconditions for validations like this. 1. theoretically takes fewer lines of code (though that's not always true) 2. gives you a more specific error message (in this case NPE)

Comment on lines 241 to 242
Integer.parseInt(
getTunnelSshPort().trim()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Integer.parseInt(
getTunnelSshPort().trim()))
Integer.parseInt(getTunnelSshPort().trim()))

Comment on lines +153 to +187
public String getMethod() {
return method;
}

public String getHost() {
return host;
}

public String getTunnelSshPort() {
return tunnelSshPort;
}

public String getUser() {
return user;
}

private String getSSHKey() {
return sshkey;
}

private String getPassword() {
return password;
}

public String getRemoteDatabaseHost() {
return remoteDatabaseHost;
}

public String getRemoteDatabasePort() {
return remoteDatabasePort;
}

public String getTunnelDatabasePort() {
return tunnelDatabasePort;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the benefit of exposing all of these getters publicly? should they at least be private? or since these variables are only used internally, can we just reference the fields directly?

@airbyte-jenny
Copy link
Contributor Author

airbyte-jenny commented Aug 26, 2021

Existing comments on the PR are probably mostly still relevant; I haven't had a chance to address them. Work is being paused due to timelines for a different feature. This one encountered a lot of complexity around the Source's read() method hierarchy, which is making testing more complex.

What still needs to be done includes identifying the root cause of the test failures for read (either no tunnel starts when needed, or tunnel tries to start twice and port-conflicts itself), get test case for postgres passing, and then go back and address the getSpec() method used for all jdbc connectors in tests, because they're assuming they don't have the ssh tunnel method spec.json nodes in there (but those are added at runtime, and thus the test must accommodate that). Right now that has to be done with duplicated code, because we also haven't addressed the structural issues with test class inheritance that would improve DRY for that.

Basically there are a bunch of structural issues that should be addressed by refactors, but I don't have the time right now to do them. These are notes for when I come back later to try to figure it out further.

To be clear: I expect that destination tests and source tests for non-postgres jdbc connectors are currently failing on testGetSpec() because the static json file they're comparing against doesn't have ssh tunnel configuration in it. I believe source changes and destination changes could be separated into different PRs, but we still have to address the testGetSpec() issue even for destinations.

* but for now that API spec doesn't support that, so we'll do it here.
* @throws Exception
*/
void dropUniqueSchema() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to use and ssh tunnel to interact with the database for setting up test fixtures? aren't we just making life harder for ourselves?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah. because the test database we are using is only accessible via ssh.

@cgardens
Copy link
Contributor

cgardens commented Aug 27, 2021

@airbyte-jenny how were you testing this locally? was there a special docker image for postgres that you were using that had the bastion onboard? or were you just running on test infra?

@CLAassistant
Copy link

CLAassistant commented Sep 27, 2021

CLA assistant check
All committers have signed the CLA.

@sherifnada
Copy link
Contributor

closing since this was done in other PRs

@sherifnada sherifnada closed this Sep 29, 2021
@swyxio swyxio deleted the airbyte-jenny/connector_ssh_tunnels_issue_312 branch October 12, 2022 18:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants